ABSTRACT The Millennium Run is the largest simulation of the formation of
structure within the $\Lambda$CDM cosmogony so far carried out. It uses $10^{10}$ particles to
follow the dark matter distribution in a cubic region $500\,h^{-1}$ Mpc on a side, and has
a spatial resolution of $5\,h^{-1}$ kpc. Application of simplified modelling techniques to
the stored output of this calculation allows the formation and evolution of the $\sim 10^{7}$
galaxies more luminous than the Small Magellanic Cloud to be simulated for a
variety of assumptions about the detailed physics involved. As part of the activities of
the German Astrophysical Virtual Observatory we have used a relational database to
store the detailed assembly histories both of all the halos and subhalos resolved
by the simulation, and of all the galaxies that form within these structures for two
independent models of the galaxy formation physics. We have created web
applications that allow users to query these databases remotely using the standard Structured
Query Language (SQL). This allows easy access to all properties of the galaxies and
halos, as well as to the spatial and temporal relations between them and their
environment. Information is output in table format compatible with standard Virtual
Observatory tools and protocols. With this announcement we are making these
structures fully accessible to all users. Interested scientists can learn SQL, gain familiarity
with the database design and test queries on a small, openly accessible version of
the Millennium Run (with volume 1/512 that of the full simulation). They can then
request accounts to run similar queries on the databases for the full simulation.
1 The Millennium Run



The last few years have seen the establishment of a standard model for the origin and
growth of structure in the Universe, the so-called $\Lambda$CDM cosmogony. In this model
small density fluctuations are generated during an early period of cosmic inflation
and first become directly observable on the last scattering surface of the Cosmic
Microwave Background, when the Universe was about 400,000 years old. Since then
the fluctuations have grown steadily through the gravitational effects of a
dominant Dark Matter component composed of some weakly interacting particle yet to
be detected directly on Earth. As fluctuations become nonlinear, larger and larger
objects collapse, giving rise to the galaxies and galaxy clusters we see today. This
process has recently been modified as Dark Energy has come to dominate the cosmic
energy density, accelerating the cosmic expansion and slowing the growth of
structure. A major effort is currently under way to test this paradigm and measure
its parameters. A parallel effort explores galaxy and cluster formation in this model
in order to understand the physical processes which shaped observed systems.

The Millennium Run, completed in summer 2004 at the Max Planck Society's
supercomputer centre in Garching, is part of the programme of the Virgo Consortium¹
and is intended as a tool to facilitate this second effort. It uses $10^{10}$ particles of mass
$8.6\times10^{8}\,h^{-1}\,M_\odot$ to follow the evolution of the dark matter distribution within a cubic
region of side $500\,h^{-1}$ Mpc from z = 127 until z = 0. The cosmological parameters
assumed are $\Omega_m = \Omega_{dm} + \Omega_b = 0.25$, $\Omega_b = 0.045$, $\Omega_\Lambda = 0.75$, $h = 0.73$, $\sigma_8 = 0.9$
and $n = 1$, with standard definitions for all quantities. The initial density
fluctuations correctly account for the oscillatory features introduced by the baryons, but the
simulation follows the dark matter only, supplementing the mass of the simulation
particles to account approximately for the neglected baryons.
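As a consistency check on these numbers, the particle mass follows from sharing the total matter content of the box equally among the particles, $m_p = \Omega_m \rho_{crit} L^3 / N$. The sketch below assumes the standard critical density $\rho_{crit} \approx 2.775\times10^{11}\,h^{2}\,M_\odot\,\mathrm{Mpc}^{-3}$ and the exact particle load $N = 2160^3 \approx 1.008\times10^{10}$; neither value is stated in the text. It also evaluates the mass of the smallest (20-particle) halos resolved by the halo finder described below.

```python
# Critical density in h^-1 M_sun per (h^-1 Mpc)^3, i.e. 2.775e11 h^2 M_sun Mpc^-3.
# Standard value, assumed here rather than taken from the text.
RHO_CRIT = 2.775e11


def particle_mass(omega_m: float, box_side: float, n_particles: int) -> float:
    """Mass per particle (h^-1 M_sun) for a cubic box of side box_side (h^-1 Mpc).

    Omega_m includes the baryons, because the dark-matter-only particles
    carry the baryonic mass as well.
    """
    total_mass = omega_m * RHO_CRIT * box_side**3
    return total_mass / n_particles


N_PART = 2160**3                         # assumed exact particle load, ~1.008e10
m_p = particle_mass(0.25, 500.0, N_PART)
m_min_halo = 20 * m_p                    # 20-particle resolution limit for halos

print(f"particle mass ~ {m_p:.2e} h^-1 M_sun")
print(f"smallest halo ~ {m_min_halo:.2e} h^-1 M_sun")
```

Using $\Omega_m$ rather than $\Omega_{dm}$ reflects the statement above that the particle masses are supplemented to account for the neglected baryons; the result agrees with the quoted $8.6\times10^{8}\,h^{-1}\,M_\odot$ to better than one per cent.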

The simulation was carried out using a modified version of the publicly available
code GADGET-2 (Springel 2005). The positions and velocities of all simulation
particles were stored at 63 times spaced approximately logarithmically from z = 20
to the present day. For each of these dumps the algorithm SUBFIND (Springel et al.
2001) was used to identify all self-bound halos containing at least 20 particles and
all self-bound subhalos within these halos down to the same mass limit. Merger trees
were then built linking each halo and its substructures at the final time to the objects
at earlier times from which they formed. These trees are the input to the final stage
of post-processing, which simulates the formation of the galaxies in all or part of the
volume by following simplified treatments of the baryonic physics within each tree,
starting at early times and integrating down to z = 0. More detailed descriptions of
the simulation itself and of this post-processing can be found in Springel et al. (2005).
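The tree-building step can be illustrated with a toy sketch. It assumes that each halo already carries a pointer to its descendant at the next output (the text does not describe how descendants are identified), discards objects below the 20-particle limit, and inverts the descendant pointers into progenitor lists that can be walked back in time. The `Halo` class and its field names are hypothetical and do not represent the Millennium data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

MIN_PARTICLES = 20  # resolution limit quoted in the text


@dataclass
class Halo:
    halo_id: int
    snapnum: int                         # index of the output time
    n_particles: int
    descendant_id: Optional[int] = None  # hypothetical link to the next output
    progenitors: List["Halo"] = field(default_factory=list)


def build_tree(halos: Iterable[Halo]) -> Dict[int, Halo]:
    """Keep resolved halos and turn descendant pointers into progenitor lists."""
    resolved = {h.halo_id: h for h in halos if h.n_particles >= MIN_PARTICLES}
    for h in resolved.values():
        if h.descendant_id in resolved:
            resolved[h.descendant_id].progenitors.append(h)
    return resolved


def all_progenitors(halo: Halo) -> List[Halo]:
    """Depth-first walk back in time over every resolved progenitor."""
    found, stack = [], list(halo.progenitors)
    while stack:
        p = stack.pop()
        found.append(p)
        stack.extend(p.progenitors)
    return found


halos = [
    Halo(0, 63, 5000),
    Halo(1, 62, 3000, descendant_id=0),
    Halo(2, 62, 1500, descendant_id=0),
    Halo(3, 61, 10, descendant_id=1),   # below 20 particles: discarded
    Halo(4, 61, 800, descendant_id=2),
]
tree = build_tree(halos)
print(sorted(p.halo_id for p in all_progenitors(tree[0])))
```

In this toy example the final halo 0 has resolved progenitors 1, 2 and 4; halo 3 falls below the particle limit and therefore never enters the tree.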

Several different galaxy formation models have already been implemented on this
structure by the Garching and Durham groups. The model used in Springel et al.
(2005) to present some initial clustering and evolution results is essentially identical
to that described and explored in considerably more detail by Croton et al. (2006c).
The model used by De Lucia et al. (2006) to study elliptical galaxy evolution is similar
in most aspects, but differs in its treatment of feedback from star formation. In
their study of brightest cluster galaxies De Lucia & Blaizot (2006) use a model with a
different assumed IMF for star formation and an improved scheme for tracking halo
central galaxies, but with the same feedback scheme as Croton et al. (2006c). The model
independently developed in Durham and presented in Bower et al. (2006) differs from
these Garching models in many ways. The scheme for building merger trees from
the halo/subhalo data is different in detail, as are many of the modelling assumptions
made to deal with the baryonic physics, most notably, perhaps, those associated with
the growth of, and the feedback from, supermassive black holes in galaxy nuclei. In
this public release we are initially making available the galaxy populations produced
by the models of the De Lucia & Blaizot (2006) and Bower et al. (2006) papers.

¹ http://www.virgo.dur.ac.uk

The data on the halo/subhalo and galaxy populations which have been produced
by this effort can be used to address a very wide range of questions about galaxy
and structure evolution in the now standard model. In the 13 months since Nature
published the first Millennium Run paper (Springel et al. 2005), a further 24
papers have appeared on the preprint server using data derived from the simulation.
Some of these are concerned with issues of dark matter structure (Gao et al. 2005;
Harker et al. 2006; Gao & White 2006). Others build and test galaxy formation
models, exploring the requirements for reproducing various aspects of the observed
properties of galaxies and AGN (Croton et al. 2006b,c; Bower et al. 2006; De Lucia et al.
2006; Croton 2006; Wang et al. 2006; … 2006; De Lucia & Blaizot 2006). Yet others
concentrate on aspects of large-scale structure and galaxy clustering (Croton et al. 2006a;
Noh & Lee 2006; Lee & Park 2006) and on cluster structure and gravitational lensing
(Hayashi & White 2006; Moeller et al. 2006; Weinmann et al. 2006; Natarajan et al.
200-). The Millennium Run is also a point of comparison in primarily observational
papers (Kauffmann et al. 2006; Patiri et al. 2006; Einasto et al. 2006; Rudnick et al. 2006;
Bernardi et al. 2006; Conroy et al. 2006; Li et al. 2006). Finally, the data have been used
to illustrate the … 2006). The goal of this release is to facilitate further such use of
Millennium Run data products by making them conveniently and publicly available
over the Web.



2 The Databases 

The stored raw data from the Millennium Run, the positions and velocities of all $10^{10}$
particles in the initial conditions and at each of the 63 later output times, have a total
volume of almost 20 TB. This is so large that general public access to, or manipulation
of, these data over the internet is not currently a viable possibility. As a result, projects
which require access to the full particle data (e.g. ray-tracing projects for gravitational
lensing applications) are only practicable in collaboration with Virgo scientists
in Garching or Durham, where copies of the full data are stored. The Virgo Consortium
welcomes suggestions for such joint projects and will try to accommodate them
as far as they overlap with the interests of scientists at one of the Virgo institutions
and do not conflict with existing projects.
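A back-of-the-envelope estimate reproduces the quoted data volume. It assumes a storage layout that the text does not specify: six single-precision (4-byte) numbers for position and velocity plus an 8-byte particle identifier, for an assumed $2160^3$ particles in 64 dumps (the initial conditions plus the 63 outputs).

```python
N_PARTICLES = 2160**3           # assumed exact particle load (~1.008e10)
N_DUMPS = 1 + 63                # initial conditions + 63 output times
BYTES_PER_PARTICLE = 6 * 4 + 8  # assumed: 3 positions + 3 velocities (float32) + 64-bit ID

total_bytes = N_PARTICLES * N_DUMPS * BYTES_PER_PARTICLE
print(f"{total_bytes / 1e12:.1f} TB (decimal), {total_bytes / 2**40:.1f} TiB (binary)")
```

Under these assumptions the total is about $2.1\times10^{13}$ bytes, i.e. roughly 21 TB in decimal units or 19 TiB in binary units, consistent with "almost 20 TB".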

Many projects, however, including the great majority of those listed at the end
of §1, can be carried out using products from our Millennium Run post-processing
pipeline. Only of order 130 GB are needed to store the information provided by our
halo/subhalo analysis, including the tree structure which describes the assembly history of
all objects. A database with the corresponding galaxy information from one of our
galaxy formation simulations is roughly twice as large because of the larger number
of objects and the larger number of attributes ascribed to each. The variety of these
attributes and the complexity of the relations between them motivate the use of
relational databases, whose query engines allow complex questions to be phrased in the
standard Structured Query Language (SQL) and executed efficiently. These
tools have become standard in the Virtual Observatory community as a means of
promoting efficient and user-friendly data mining within large observational databases.
A major task for the German Astrophysical Virtual Observatory (GAVO) has been to
adapt these tools for large theoretical (simulation) databases, where issues of format
and quality control are less difficult than for observational archives, but where
relationships can be considerably more complex, primarily because of the addition of the
time dimension.

The relational database structure and the online query interface which we have set
up for Millennium Run data products are modelled closely on the relevant parts of the
very successful SkyServer system set up for the Sloan Digital Sky Survey.² Within
such a database, information is stored in the form of tables where rows correspond
to individual objects and columns to attributes of those objects (e.g. position,
velocity, mass, angular momentum, size, flattening, type, luminosity, colour, indices
specifying relations to other objects). The web interface allows users to formulate
their scientific questions as SQL queries operating on these tables in a relatively
simple way and to submit them remotely over the internet for execution on the database
server (located at present in Garching, with a mirror to be set up in the near future
in Durham). The subsequent search of the database is optimised as far as possible
for "typical" queries. Results are returned to the user over the internet as tables in
one of a number of formats which can then be fed into standard VO or other graphics
packages, or can be further manipulated by users with their own software.
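To make the table model concrete, the following sketch uses Python's built-in `sqlite3` module as a stand-in for the production database server, which this text does not name. The `halo` table, its column names and units, the index and the sample rows are all hypothetical and deliberately minimal. The query is of the first kind listed later in this section: all halos at one output time, inside a sub-volume, within a mass range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per halo, one column per attribute.
conn.execute("""
    CREATE TABLE halo (
        haloId  INTEGER PRIMARY KEY,
        snapnum INTEGER,          -- output index (63 = z = 0 in this sketch)
        x REAL, y REAL, z REAL,   -- comoving position, h^-1 Mpc
        mass    REAL              -- h^-1 M_sun
    )""")
conn.executemany(
    "INSERT INTO halo VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 63, 10.0, 10.0, 10.0, 1e14),
        (2, 63, 12.0, 11.0, 9.0, 5e12),
        (3, 63, 300.0, 40.0, 20.0, 2e14),  # outside the sub-box
        (4, 62, 10.5, 10.2, 9.8, 8e13),    # earlier output
    ],
)
# An index on the columns used by "typical" queries lets the engine avoid full scans.
conn.execute("CREATE INDEX idx_snap_mass ON halo (snapnum, mass)")

query = """
    SELECT haloId, mass
    FROM halo
    WHERE snapnum = ?
      AND x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND z BETWEEN ? AND ?
      AND mass BETWEEN ? AND ?
    ORDER BY mass DESC
"""
rows = conn.execute(query, (63, 0, 50, 0, 50, 0, 50, 1e12, 1e15)).fetchall()
print(rows)
```

The result is itself a table (a list of rows), which is the form in which the query interface returns results for further processing.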

The entry point for our public release of Millennium products is 
http://www.mpa-garching.mpg.de/Millennium 

This top page gives a brief introduction to the Millennium Run as well as links to
images and movies, to papers which have used the data, and to the pages on the
GAVO site which describe in detail the database structure, the SQL and the
procedures for accessing and downloading data. Data on the so-called "milli-Millennium"
run (hereafter milli-M) can be accessed from these pages immediately. This is a
simulation which is identical to the main run in all aspects except that it is carried out
in a cubic region of side $62.5\,h^{-1}$ Mpc. It thus has 1/512 of the physical volume, and
its databases are correspondingly about 512 times smaller than those of the Millennium
Run itself. Any query can be executed on the milli-M databases directly from this page,
provided it executes on the host computer in less than 30 seconds. The first 10,000 lines
of any output are returned to the user. These restrictions are intended to avoid
inadvertently tying up the host or the internet link while users develop their SQL expertise.
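The factor of 512 is the cube of the ratio of box sides, $(62.5/500)^3 = (1/8)^3$. Because milli-M is identical to the main run apart from its size, it also has the same particle mass and hence 1/512 as many particles. The sketch below assumes the main run's exact load of $2160^3$ particles (not stated in the text), which gives $270^3$ particles for milli-M.

```python
FULL_BOX = 500.0    # h^-1 Mpc
MILLI_BOX = 62.5    # h^-1 Mpc
N_SIDE_FULL = 2160  # assumed: the full run uses 2160^3 particles

volume_ratio = (MILLI_BOX / FULL_BOX) ** 3
# Same particle mass => same mean interparticle spacing,
# so the number of particles per side scales with the box side.
n_side_milli = N_SIDE_FULL * MILLI_BOX / FULL_BOX

print(volume_ratio, 1 / volume_ratio)
print(n_side_milli, n_side_milli**3)
```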

Once users can execute their queries efficiently on the milli-M databases, they
can apply for password-protected accounts, as specified on the web page, in order to
carry out the corresponding queries on the main database. This two-stage system is
intended to allow monitoring of usage patterns and to prevent accidental abuse by
inexperienced users.

² http://cas.sdss.org/dr5/en/

At present, databases are accessible for the (sub)halo data and for galaxy data from
the models of De Lucia & Blaizot (2006) and Bower et al. (2006). Identical models
and attribute lists are used for the milli-M and full Millennium versions of these
databases. Further galaxy models and further data products may be added as they are
generated. Examples of scientific queries that can be formulated simply and executed
efficiently with the present SQL engine include:

• Find all halos (or galaxies) in a given part of the simulation at a given time and 
in a given mass range 

• Find all companion halos (or galaxies) in some given range of separations from 
this previous set of objects 

• Find the number of galaxies at a given time in each of a series of narrow
luminosity bins (i.e. the luminosity function)

• Find the number of galaxies in high-mass halos at a given time in such
luminosity bins (i.e. the cluster luminosity function)

• Find all resolved progenitor halos at redshift 3 of high-mass z = 0 halos

• Find all galaxies at redshift 3 which are progenitors of the central galaxies of
high-mass z = 0 halos

• Find the halo masses of all $10^{11}\,M_\odot$ galaxies at z = 3 and determine the fraction
which are central galaxies of these halos

• Find the z = 0 descendants of all redshift 3 galaxies with stellar mass above
$10^{11}\,M_\odot$ or with star formation rate above $10\,M_\odot$/yr

• Find all halos (or galaxies) which have undergone a major merger since the 
previous stored output time 
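Two of the queries above can be sketched with Python's built-in `sqlite3` module on a toy table. The `galaxy` table, its columns, the 0.5 mag bin width and the use of snapshot 40 as a stand-in for z = 3 are all hypothetical. The progenitor search uses a recursive common table expression over descendant pointers; this is one portable SQL technique, not necessarily how the Millennium databases encode their merger trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per galaxy per output; descendantId points to
# the galaxy (or its merger product) at the next output, NULL at the last one.
conn.execute("""
    CREATE TABLE galaxy (
        galaxyId     INTEGER PRIMARY KEY,
        snapnum      INTEGER,
        mag          REAL,     -- absolute magnitude in some band
        descendantId INTEGER
    )""")
conn.executemany(
    "INSERT INTO galaxy VALUES (?, ?, ?, ?)",
    [
        (1, 63, -23.2, None), (2, 63, -21.7, None),
        (3, 63, -20.1, None), (4, 63, -21.9, None),
        (10, 40, -20.5, 1), (11, 40, -19.8, 1),
        (12, 39, -19.0, 10), (13, 40, -18.5, 2),
    ],
)

# Luminosity function: bin k counts galaxies with -24 + 0.5k <= mag < -24 + 0.5(k+1).
lf = conn.execute("""
    SELECT CAST((mag + 24.0) / 0.5 AS INTEGER) AS bin, COUNT(*)
    FROM galaxy
    WHERE snapnum = 63 AND mag >= -24.0
    GROUP BY bin
    ORDER BY bin""").fetchall()

# Progenitors of galaxy 1 at snapshot 40: follow descendant links backwards.
progs = [row[0] for row in conn.execute("""
    WITH RECURSIVE prog(galaxyId, snapnum) AS (
        SELECT galaxyId, snapnum FROM galaxy WHERE galaxyId = ?
        UNION ALL
        SELECT g.galaxyId, g.snapnum
        FROM galaxy AS g JOIN prog AS p ON g.descendantId = p.galaxyId
    )
    SELECT galaxyId FROM prog WHERE snapnum = ? ORDER BY galaxyId""",
    (1, 40))]

print(lf)
print(progs)
```

A GROUP BY over a computed bin index is the usual way to express a binned count in SQL; the cluster luminosity function would add a join to a halo table with a mass cut.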

Clearly this capability allows a very broad range of scientific issues to be addressed 
in a straightforward way. Some of these are currently being studied by scientists 
associated with the Virgo Consortium, but we hope that this release will encourage 
others to use the exceptional statistics provided by the Millennium Run to explore 
how galactic and dark matter structures evolve in the current $\Lambda$CDM paradigm. In
particular, closer comparison with a wide range of observational data should indicate 
how our present simple models for the formation of galaxies and AGN need to be 
modified to correspond better with reality. 